When Pictures Don’t Match Words: Using CLIP to Validate Experimental Stimuli
Prelude
What?
During my PhD, I’ve become fairly obsessed with production studies. I find them extremely interesting, especially the way they combine what we know from theoretical linguistics with creative experimental methods. Not to mention that the theoretical framing of production work is still quite thin, and the main theories in use need a major overhaul. Some of the most interesting papers in this area are Shota Momma’s studies on advance verb planning. Similar work has also been done in German and Basque by Sebastian Sauppe’s group.
Let me set the scene. Momma and colleagues ran multiple picture description experiments where participants saw images and produced sentences like:
- “The octopus below the spoon is swimming” (unergative)
- “The octopus below the spoon is boiling” (unaccusative)
If you’re not a syntax nerd, here’s the ultra-compressed version: verbs like “swim” and “bark” (unergatives) are different from verbs like “sink” and “melt” (unaccusatives), even though they both describe single-argument events. The difference has to do with argument structure—where the subject comes from in the underlying syntax. It has been argued that the subjects of unaccusatives are actually ‘deep objects’ for lack of a better term, and they structurally start in the same position as any other object.
They showed that these two verb types behave differently in production experiments: speakers plan them differently. They tested this by superimposing related or unrelated words on the pictures. When the words were related to the verb, participants slowed down before they started speaking, but only with unaccusatives. The theoretical claim was that unaccusative verbs are planned earlier in the sentence production process, possibly right at the beginning, along with the subject.
Why though?
Here’s the thing: another reason production work excites me is that there are so many possible confounds that require checking. And I love this song and dance in psycholinguistics, where I can stress-test findings and see how stable they are. It’s especially important when you find an unexpected result, like participants taking longer to start speaking even though they won’t say the verb for at least three more seconds. My first instinct, and I hope yours, is to wonder: “Is this real, or is something else going on?”
This post is built on a very specific worry: what if the unaccusative scenes themselves, and not their syntax, created the results? One interesting finding in Momma’s papers was that unergative planning was seemingly invisible. He has given reasons to believe that it happens while the second NP is being uttered. But quantitatively, the signature of unergative planning seems diffused throughout the sentence, while the unaccusative planning signature is strikingly clear.
This raises the following question: is it possible that participants, simply because the picture was harder to understand or the subject was more involved in the action, spent more time at the start either understanding the event or extracting the subject from it, and that during this window an automatic analysis of the written word kicked in and slowed them down when it was related? Unergative subjects are more easily dissociable from the event, since nothing is happening to them in those pictures; extracting them takes less time and fewer resources, so no additional process interferes. This account makes several predictions. First, in follow-up experiments where the unergative subjects are hard to ‘retrieve’ from the scene, one should see similar onset effects. Second, if there is some sort of picture-difficulty metric, the advance-planning effect should align with that metric item-wise.
The second prediction is going to be the basis of this blog post, where we will find a way to quantify the picture difficulty.
I make assumptions
I assume the following two-way distinction with respect to lexical verbs. One needs to admit, however, that unaccusativity is not always stable: many unaccusative verbs can be used as unergatives given some adverbial modification or a different context. In English this creates some minor infelicity, but that is not the case for many languages. Laz, for example, can make any verb ‘agentive’ with a small prefix. Imagine a Laz-type English where you have “I cried” vs. “I do-cried,” where the second means that you made yourself cry or cried deliberately. Or a better example: imagine if English “jump” were decomposable into a prefix “do-” plus “fall.” So for now I assume that these properties are lexical properties of the verb, while admitting that they are really event-related ones.
- Unergative actions (swimming, barking, running): The action is performed by the agent. You can see the octopus swimming—the action is somewhat separable from what happens to the entity.
- Unaccusative actions (boiling, melting, sinking): Something is happening to the entity. The octopus isn’t “doing” boiling—it’s undergoing a change of state. The action and the entity are less separable.
Another assumption I make is about CLIP/VLMs. The input CLIP takes is a written sentence and a picture. I am fully aware that the way CLIP assesses pictures is nowhere near how humans do.1 I am also aware that in human speech the scene is what gets encoded and the speech is the decoding; CLIP works differently. CLIP is a two-encoder model: given a picture and a text, it creates two separate vectors and checks how similar those vectors are. So it tells us nothing about human cognition, but it gives us a way to quantify relevant metrics. Below are what I assume to be the model of human speech production, based on Levelt’s work, and CLIP’s architecture.
Levelt’s Speech Production Model:
flowchart TD
A[Conceptualizer] --> |Preverbal Message| B[Formulator]
B --> |Grammatical Encoding| C[Mental Lexicon<br/>Lemmas]
C --> B
B --> |Phonological Encoding| D[Mental Lexicon<br/>Forms]
D --> B
B --> |Phonetic Plan| E[Articulator]
E --> |Overt Speech| F[OUTPUT]
E -.-> |Self-Monitoring| G[Speech Comprehension<br/>System]
G -.-> A
style A fill:#e1f5dd
style B fill:#d4e9f7
style C fill:#fff3cd
style D fill:#fff3cd
style E fill:#ffd4e5
style F fill:#f8d7da
style G fill:#e8e8e8
CLIP Architecture:
flowchart TD
A[Picture] --> B[Image Encoder]
C[Text] --> D[Text Encoder]
B --> E[Image Embedding]
D --> F[Text Embedding]
E --> G[Similarity Score]
F --> G
style A fill:#e1f5dd
style C fill:#e1f5dd
style B fill:#d4e9f7
style D fill:#d4e9f7
style E fill:#fff3cd
style F fill:#fff3cd
style G fill:#f8d7da
Multimodal LLMs:
More recently, multimodal large language models have emerged that work quite differently from CLIP. Instead of creating separate embeddings and comparing them, these models integrate visual and textual information into a unified representation and can generate natural language descriptions or answers about images.
I have to say, writing their code is also a bit funny. You basically have to build a pipeline where you create a ‘chat template’ and ask them to give you an output. I am not sure that is how you are supposed to use them, but it works.2
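To make that concrete, here is a toy sketch of the kind of message structure such chat-template pipelines typically expect. The `build_chat_prompt` helper and the file path are made up for illustration; the actual Qwen-VL-Chat call appears later in the post.

```python
# Toy sketch of the 'chat template' idea: multimodal chat models take a list
# of role-tagged turns whose content mixes image references and text.
# The helper name and path below are hypothetical, for illustration only.
def build_chat_prompt(image_path, question):
    """Assemble the kind of message list a multimodal chat model expects."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": question},
            ],
        }
    ]

messages = build_chat_prompt("./pictures/octopus_boil.jpg",
                             "What is happening in this picture?")
print(messages[0]["role"])  # user
```

The model's tokenizer then flattens this structure into its own prompt format; that is the part each model family does differently.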
Models like Qwen3-Omni take both images and text as input, process them through vision encoders and language models together, and generate coherent text outputs. Unlike CLIP’s similarity metric, multimodal LLMs can provide richer, more nuanced interpretations of visual scenes and answer complex questions about them. We will use both approaches and compare them here.
flowchart TD
A[Picture] --> B[Vision Encoder]
C[Text Prompt] --> D[Tokenizer]
B --> E[Visual Tokens]
D --> F[Text Tokens]
E --> G[Unified LLM]
F --> G
G --> H[Generated Text Output]
style A fill:#e1f5dd
style C fill:#e1f5dd
style B fill:#d4e9f7
style D fill:#d4e9f7
style E fill:#fff3cd
style F fill:#fff3cd
style G fill:#ffd4e5
style H fill:#f8d7da
Lastly, the original experiments were run as extended picture-word interference (PWI) experiments, where participants saw a picture with text superimposed on it. Neither the pictures nor the tasks I improvise here bear any real relation to the picture-word interference task itself. It would indeed be interesting to understand what PWI would look like in terms of LLM tasks, but that is far from what I would like to achieve here. If I ever flesh out that idea, I will probably submit a paper or an abstract somewhere :).
Predictions
If unaccusative actions (like “boiling” or “melting”) are genuinely harder to see in pictures, or if their subjects are harder to visually identify in the scenes, we’d expect:
- Lower similarity scores between the images and their target sentences
- Evidence that models struggle to “ground” the sentence or the subject entity in the visual input, in the form of lower subject salience
If that’s the case, we have a problem—the onset latency effect might just be about picture difficulty.3
But if the similarity scores are comparable or higher for unaccusatives, then we can rule out the perceptual confound for now and be more confident that the effects reflect genuine linguistic processing.
Model Base
CLIP
CLIP (Contrastive Language-Image Pre-training) is a neural network trained on 400 million image-text pairs from the internet. It learns to match images with their corresponding text descriptions by projecting both into a shared embedding space.
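Mechanically, the “similarity” CLIP reports is just a scaled cosine between the two embeddings. A minimal numpy sketch of that scoring step, with made-up vectors (real CLIP embeddings come from the encoders, and the scale of 100 matches CLIP’s learned temperature):

```python
import numpy as np

# Sketch of CLIP's scoring step with made-up 512-d vectors.
# Real CLIP gets these from its image and text encoders.
rng = np.random.default_rng(0)
image_embedding = rng.normal(size=512)
text_embedding = rng.normal(size=512)

# Both embeddings are L2-normalized, so their dot product is a cosine.
image_embedding = image_embedding / np.linalg.norm(image_embedding)
text_embedding = text_embedding / np.linalg.norm(text_embedding)

# CLIP scales the cosine by a learned temperature (roughly 100 after
# training), which is why logits_per_image values land in the 20-30 range.
logit_scale = 100.0
similarity = logit_scale * float(image_embedding @ text_embedding)
print(similarity)
```

Nothing deep is hiding in the score, then: it is bounded by ±100, and differences between conditions are differences in embedding-space alignment.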
Setting Up
Let’s start by loading the packages we’ll need. I’m going to build this up step by step, just like I did when I first ran this analysis.
import os
import torch
import clip
from PIL import Image
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from transformers import AutoModelForCausalLM, AutoTokenizer
# Set up plotting style
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (10, 6)
First, we need to load the CLIP model. I’m using the ViT-B/32 variant, which is a good balance between performance and computational efficiency:
# Load the two-encoder CLIP model
# Note: We use CPU for everything if MPS is detected to avoid moondream2 issues
if torch.cuda.is_available():
device = "cuda"
elif hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():
device = "cpu"
else:
device = "cpu"
model_clip, preprocess = clip.load("ViT-B/32", device=device, jit=False)
print(f"Using device: {device}")
print(f"CLIP model loaded successfully!")
Now let’s also load a multimodal LLM for comparison. We’ll use Qwen-VL-Chat, a powerful vision-language model:
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
import transformers
import torch
# Re-export BeamSearchScorer at the top level; Qwen-VL-Chat's remote code expects it there
from transformers.generation.beam_search import BeamSearchScorer
transformers.BeamSearchScorer = BeamSearchScorer
# Load Qwen-VL-Chat model
model_id = "Qwen/Qwen-VL-Chat"
model_vlm = AutoModelForCausalLM.from_pretrained(
model_id,
trust_remote_code=True,
dtype=torch.float32
).to('cpu')
tokenizer_vlm = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
# Create the streamer
streamer = TextStreamer(tokenizer_vlm, skip_prompt=True)
The Data Structure
My experimental materials consist of 24 scenes:
- 12 unergative scenes (swimming, running, barking, etc.)
- 12 unaccusative scenes (boiling, shrinking, sinking, etc.)
Each scene pairs a character (octopus, ballerina, chef, etc.) with an action. Let’s create a dataframe with our materials:
# Unergative scenes
df_unerg = pd.DataFrame({
"Filename": [
"./pictures/octopus_swim.jpg",
"./pictures/ballerina_run.jpg",
"./pictures/boy_float.jpg",
"./pictures/chef_yell.jpg",
"./pictures/clown_walk.jpg",
"./pictures/cowboy_wink.jpg",
"./pictures/dog_bark.jpg",
"./pictures/monkey_sleep.jpg",
"./pictures/penguin_sneeze.jpg",
"./pictures/pirate_cough.jpg",
"./pictures/rabbit_smile.jpg",
"./pictures/snail_crawl.jpg",
],
"Sentence": [
"The octopus is swimming.",
"The ballerina is running.",
"The boy is floating.",
"The chef is yelling.",
"The clown is walking.",
"The cowboy is winking.",
"The dog is barking.",
"The monkey is sleeping.",
"The penguin is sneezing.",
"The pirate is coughing.",
"The rabbit is smiling.",
"The snail is crawling.",
]
})
# Unaccusative scenes
df_unacc = pd.DataFrame({
"Filename": [
"./pictures/octopus_boil.jpg",
"./pictures/ballerina_shrink.jpg",
"./pictures/boy_yawn.jpg",
"./pictures/chef_drown.jpg",
"./pictures/clown_grow.jpg",
"./pictures/cowboy_fall.jpg",
"./pictures/dog_spin.jpg",
"./pictures/monkey_trip.jpg",
"./pictures/penguin_bounce.jpg",
"./pictures/pirate_sink.jpg",
"./pictures/rabbit_shake.jpg",
"./pictures/snail_melt.jpg",
],
"Sentence": [
"The octopus is boiling.",
"The ballerina is shrinking.",
"The boy is yawning.",
"The chef is drowning.",
"The clown is growing.",
"The cowboy is falling.",
"The dog is spinning.",
"The monkey is tripping.",
"The penguin is bouncing.",
"The pirate is sinking.",
"The rabbit is shaking.",
"The snail is melting.",
]
})
Computing Similarity Scores
Now for the main event. For each image-sentence pair, we’ll compute CLIP’s similarity score. This tells us how well the model thinks the image matches the text.
def compute_clip_similarity(df, model, preprocess, device):
"""
Compute CLIP similarity scores for image-text pairs.
Parameters:
-----------
df : pandas.DataFrame
DataFrame with 'Filename' and 'Sentence' columns
model : CLIP model
Loaded CLIP model
preprocess : function
CLIP preprocessing function
device : str
'cuda' or 'cpu'
Returns:
--------
pandas.DataFrame
Original dataframe with added 'CLIP_Similarity' column
"""
similarity_scores = []
for _, row in df.iterrows():
img_path = row['Filename']
text = row['Sentence']
# Preprocess image and tokenize text
img = preprocess(Image.open(img_path)).unsqueeze(0).to(device)
text_tokenized = clip.tokenize([text]).to(device)
# Compute similarity
with torch.no_grad():
logits_per_image, _ = model(img, text_tokenized)
similarity_score = logits_per_image.item()
similarity_scores.append(similarity_score)
# Add scores to dataframe
df_copy = df.copy()
df_copy['CLIP_Similarity'] = similarity_scores
return df_copy
def compute_subject_salience(df, model, preprocess, device):
"""
Compute CLIP similarity scores for subject noun alone.
This measures how visually salient/easy to identify the subject is.
Parameters:
-----------
df : pandas.DataFrame
DataFrame with 'Filename' and 'Sentence' columns
model : CLIP model
Loaded CLIP model
preprocess : function
CLIP preprocessing function
device : str
'cuda' or 'cpu'
Returns:
--------
pandas.DataFrame
Original dataframe with added 'Subject_Salience' column
"""
subject_scores = []
for _, row in df.iterrows():
img_path = row['Filename']
sentence = row['Sentence']
# Extract subject noun (assumes format "The X is ...")
# Extract word after "The " and before " is"
subject = sentence.split("The ")[1].split(" is")[0]
# Preprocess image and tokenize subject
img = preprocess(Image.open(img_path)).unsqueeze(0).to(device)
text_tokenized = clip.tokenize([subject]).to(device)
# Compute similarity
with torch.no_grad():
logits_per_image, _ = model(img, text_tokenized)
similarity_score = logits_per_image.item()
subject_scores.append(similarity_score)
df_copy = df.copy()
df_copy['Subject_Salience'] = subject_scores
return df_copy
We can also use a multimodal LLM to verify the image-sentence match in a different way. Instead of computing similarity scores, we’ll ask the model to rate how well the sentence describes the image:
def compute_qwen_scores(df, model, tokenizer, streamer=None):
"""
Compute verification scores using Qwen-VL-Chat multimodal LLM.
Parameters:
-----------
df : pandas.DataFrame
DataFrame with 'Filename' and 'Sentence' columns
model : Qwen-VL-Chat model
Loaded Qwen model
tokenizer : AutoTokenizer
Qwen tokenizer
streamer : TextStreamer, optional
Streamer for real-time output
Returns:
--------
pandas.DataFrame
Original dataframe with added 'VLM_Score' and 'VLM_Response' columns
"""
import re
scores = []
responses = []
for idx, row in df.iterrows():
img_path = row['Filename']
sentence = row['Sentence']
# Create query for Qwen-VL-Chat
query = tokenizer.from_list_format([
{'image': img_path},
{'text': f'Rate how well this sentence describes the image: "{sentence}"\nScore from 1-10 (1=mismatch, 10=perfect match). Reply with just the number.'},
])
# Generate response
with torch.no_grad():
response, _ = model.chat(tokenizer, query=query, history=None, streamer=streamer)
# Extract numeric score
try:
match = re.search(r'(\d+(?:\.\d+)?)', response)
score = float(match.group(1)) if match else 5.0
score = min(10.0, max(1.0, score)) # Clamp to 1-10
except (ValueError, AttributeError):
score = 5.0
scores.append(score)
responses.append(response)
df_copy = df.copy()
df_copy['VLM_Score'] = scores
df_copy['VLM_Response'] = responses
return df_copy
Let’s run this on both datasets. To avoid re-computing the slow VLM scores on every render, we cache results to a CSV file:
import os
CACHE_FILE = "./cached_scores.csv"
if os.path.exists(CACHE_FILE):
df_all = pd.read_csv(CACHE_FILE)
else:
# Compute CLIP similarities
df_unerg_clip = compute_clip_similarity(df_unerg, model_clip, preprocess, device)
df_unacc_clip = compute_clip_similarity(df_unacc, model_clip, preprocess, device)
# Compute subject salience scores
df_unerg_subj = compute_subject_salience(df_unerg, model_clip, preprocess, device)
df_unacc_subj = compute_subject_salience(df_unacc, model_clip, preprocess, device)
# Compute Qwen-VL scores
df_unerg_vlm = compute_qwen_scores(df_unerg, model_vlm, tokenizer_vlm, streamer=streamer)
df_unacc_vlm = compute_qwen_scores(df_unacc, model_vlm, tokenizer_vlm, streamer=streamer)
# Combine CLIP scores with VLM scores and subject salience
df_unerg_scored = df_unerg_clip.copy()
df_unerg_scored['Subject_Salience'] = df_unerg_subj['Subject_Salience']
df_unerg_scored['VLM_Score'] = df_unerg_vlm['VLM_Score']
df_unerg_scored['VLM_Response'] = df_unerg_vlm['VLM_Response']
df_unerg_scored['VerbType'] = 'Unergative'
df_unacc_scored = df_unacc_clip.copy()
df_unacc_scored['Subject_Salience'] = df_unacc_subj['Subject_Salience']
df_unacc_scored['VLM_Score'] = df_unacc_vlm['VLM_Score']
df_unacc_scored['VLM_Response'] = df_unacc_vlm['VLM_Response']
df_unacc_scored['VerbType'] = 'Unaccusative'
# Combine for analysis
df_all = pd.concat([df_unerg_scored, df_unacc_scored], ignore_index=True)
# Save to cache
df_all.to_csv(CACHE_FILE, index=False)
print(df_all.head())
Filename Sentence CLIP_Similarity \
0 ./octopus_swim.jpg The octopus is swimming. 29.137495
1 ./ballerina_run.jpg The ballerina is running. 27.731918
2 ./boy_float.jpg The boy is floating. 20.843243
3 ./chef_yell.jpg The chef is yelling. 27.878561
4 ./clown_walk.jpg The clown is walking. 27.077477
Subject_Salience VLM_Score VLM_Response VerbType
0 28.454519 8.0 8 Unergative
1 25.250607 7.0 7 Unergative
2 21.628622 1.0 1 Unergative
3 28.490120 8.0 8 Unergative
4 26.241133 8.0 8 Unergative
Descriptive Results
Let’s start by looking at the descriptive statistics across all three metrics:
# Create comparison plot with all three metrics
fig, axes = plt.subplots(1, 3, figsize=(20, 6))
# CLIP full sentence results
sns.pointplot(data=df_all, x='VerbType', y='CLIP_Similarity',
hue='VerbType', palette=['#3498db', '#e74c3c'],
ax=axes[0], errorbar='ci', capsize=0.1,
linestyle='none', markers='o', legend=False)
sns.stripplot(data=df_all, x='VerbType', y='CLIP_Similarity',
color='black', alpha=0.5, size=8, ax=axes[0], jitter=0.2)
axes[0].set_xlabel('Verb Type', fontsize=14, fontweight='bold')
axes[0].set_ylabel('CLIP Similarity Score', fontsize=14, fontweight='bold')
axes[0].set_title('Full Sentence Similarity',
fontsize=16, fontweight='bold', pad=20)
for verb_type in ['Unergative', 'Unaccusative']:
mean_val = df_all[df_all['VerbType'] == verb_type]['CLIP_Similarity'].mean()
axes[0].text(0 if verb_type == 'Unergative' else 1, mean_val + 1,
f'M = {mean_val:.2f}', ha='center', fontsize=12, fontweight='bold')
# Subject salience results
sns.pointplot(data=df_all, x='VerbType', y='Subject_Salience',
hue='VerbType', palette=['#3498db', '#e74c3c'],
ax=axes[1], errorbar='ci', capsize=0.1,
linestyle='none', markers='o', legend=False)
sns.stripplot(data=df_all, x='VerbType', y='Subject_Salience',
color='black', alpha=0.5, size=8, ax=axes[1], jitter=0.2)
axes[1].set_xlabel('Verb Type', fontsize=14, fontweight='bold')
axes[1].set_ylabel('Subject Salience Score', fontsize=14, fontweight='bold')
axes[1].set_title('Subject Noun Identifiability',
fontsize=16, fontweight='bold', pad=20)
for verb_type in ['Unergative', 'Unaccusative']:
mean_val = df_all[df_all['VerbType'] == verb_type]['Subject_Salience'].mean()
axes[1].text(0 if verb_type == 'Unergative' else 1, mean_val + 0.5,
f'M = {mean_val:.2f}', ha='center', fontsize=12, fontweight='bold')
# VLM results
sns.pointplot(data=df_all, x='VerbType', y='VLM_Score',
hue='VerbType', palette=['#3498db', '#e74c3c'],
ax=axes[2], errorbar='ci', capsize=0.1,
linestyle='none', markers='o', legend=False)
sns.stripplot(data=df_all, x='VerbType', y='VLM_Score',
color='black', alpha=0.5, size=8, ax=axes[2], jitter=0.2)
axes[2].set_xlabel('Verb Type', fontsize=14, fontweight='bold')
axes[2].set_ylabel('Qwen-VL Match Score (1-10)', fontsize=14, fontweight='bold')
axes[2].set_title('Scene Verification (Qwen-VL)',
fontsize=16, fontweight='bold', pad=20)
for verb_type in ['Unergative', 'Unaccusative']:
mean_val = df_all[df_all['VerbType'] == verb_type]['VLM_Score'].mean()
axes[2].text(0 if verb_type == 'Unergative' else 1, mean_val + 0.3,
f'M = {mean_val:.2f}', ha='center', fontsize=12, fontweight='bold')
plt.tight_layout()
plt.savefig('./model_comparison_plot.png', dpi=300, bbox_inches='tight')
plt.show()
Bayesian Analysis
To get a better sense of the uncertainty around these differences, I ran a Bayesian regression using Pyro. The idea is simple: instead of just looking at means, we can model the effect of verb type on each metric and get full posterior distributions.
import torch
import pyro
import pyro.distributions as dist
from pyro.infer import MCMC, NUTS
# Prepare data for Pyro
df_pyro = df_all.copy()
df_pyro['VerbType_num'] = df_pyro['VerbType'].map({'Unergative': -0.5, 'Unaccusative': 0.5})
df_pyro['CLIP_centered'] = df_pyro['CLIP_Similarity'] - df_pyro['CLIP_Similarity'].mean()
df_pyro['Subject_centered'] = df_pyro['Subject_Salience'] - df_pyro['Subject_Salience'].mean()
# Convert to tensors
verb_type_tensor = torch.tensor(df_pyro['VerbType_num'].values, dtype=torch.float32)
clip_tensor = torch.tensor(df_pyro['CLIP_centered'].values, dtype=torch.float32)
subject_tensor = torch.tensor(df_pyro['Subject_centered'].values, dtype=torch.float32)
# Model for CLIP similarity
def clip_model(verb_type, obs=None):
# Priors
intercept = pyro.sample('intercept', dist.Normal(0., 10.))
beta = pyro.sample('beta', dist.Normal(0., 10.))
sigma = pyro.sample('sigma', dist.HalfNormal(10.))
# Linear model
mu = intercept + beta * verb_type
# Likelihood
with pyro.plate('data', len(verb_type)):
pyro.sample('obs', dist.Normal(mu, sigma), obs=obs)
# Run MCMC for CLIP
nuts_kernel = NUTS(clip_model)
mcmc_clip = MCMC(nuts_kernel, num_samples=2000, warmup_steps=1000)
mcmc_clip.run(verb_type_tensor, clip_tensor)
# Get posterior samples
clip_samples = mcmc_clip.get_samples()
clip_beta_mean = clip_samples['beta'].mean().item()
clip_beta_hdi = torch.quantile(clip_samples['beta'], torch.tensor([0.025, 0.975]))
print(f"\nCLIP Similarity - Bayesian Regression:")
print(f" Beta (VerbType effect): {clip_beta_mean:.3f}")
print(f" 95% HDI: [{clip_beta_hdi[0]:.3f}, {clip_beta_hdi[1]:.3f}]")
print(f" P(beta < 0): {(clip_samples['beta'] < 0).float().mean():.3f}")
```
CLIP Similarity - Bayesian Regression:
  Beta (VerbType effect): -2.342
  95% HDI: [-5.286, 0.568]
  P(beta < 0): 0.947
```
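One caveat worth flagging about my own code here: `torch.quantile` at 0.025 and 0.975 gives an *equal-tailed* credible interval, which only coincides with a true highest-density interval when the posterior is roughly symmetric. If you want an actual HDI, a minimal sketch in plain NumPy (with `hdi` as a made-up helper name) is:

```python
import numpy as np

def hdi(samples, prob=0.95):
    """Narrowest interval containing `prob` of the posterior draws
    (a true highest-density interval, unlike equal-tailed quantiles)."""
    x = np.sort(np.asarray(samples))
    n = len(x)
    k = int(np.ceil(prob * n))           # draws per candidate interval
    widths = x[k - 1:] - x[: n - k + 1]  # width of every length-k window
    i = int(np.argmin(widths))           # the narrowest window wins
    return x[i], x[i + k - 1]

# For a skewed posterior the HDI and the equal-tailed interval differ:
rng = np.random.default_rng(0)
draws = rng.gamma(2.0, 1.0, size=8000)
lo, hi = hdi(draws)
```

For the fairly symmetric posteriors in these models the two intervals land in nearly the same place, which is why I didn't bother in the analysis itself.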
```python
# Model for Subject Salience
def subject_model(verb_type, obs=None):
    intercept = pyro.sample('intercept', dist.Normal(0., 10.))
    beta = pyro.sample('beta', dist.Normal(0., 10.))
    sigma = pyro.sample('sigma', dist.HalfNormal(10.))
    mu = intercept + beta * verb_type
    with pyro.plate('data', len(verb_type)):
        pyro.sample('obs', dist.Normal(mu, sigma), obs=obs)

# Run MCMC for Subject Salience
nuts_kernel = NUTS(subject_model)
mcmc_subject = MCMC(nuts_kernel, num_samples=2000, warmup_steps=1000)
mcmc_subject.run(verb_type_tensor, subject_tensor)
subject_samples = mcmc_subject.get_samples()

subject_beta_mean = subject_samples['beta'].mean().item()
subject_beta_hdi = torch.quantile(subject_samples['beta'], torch.tensor([0.025, 0.975]))

print(f"\nSubject Salience - Bayesian Regression:")
print(f"  Beta (VerbType effect): {subject_beta_mean:.3f}")
print(f"  95% HDI: [{subject_beta_hdi[0]:.3f}, {subject_beta_hdi[1]:.3f}]")
print(f"  P(beta < 0): {(subject_samples['beta'] < 0).float().mean():.3f}")
```
```
Subject Salience - Bayesian Regression:
  Beta (VerbType effect): -1.480
  95% HDI: [-4.629, 1.625]
  P(beta < 0): 0.818
```
Model for VLM Score (Ordered Logistic Regression)
```python
# We treat VLM scores as ordinal data
vlm_score_tensor = torch.tensor(df_pyro['VLM_Score'].values, dtype=torch.long)
k_categories = vlm_score_tensor.max().item() + 1
k_cutpoints = k_categories - 1

def vlm_model(verb_type, obs=None):
    alpha = pyro.sample('alpha', dist.Normal(0., 10.))
    beta = pyro.sample('beta', dist.Normal(0., 10.))
    with pyro.plate("cutpoints_plate", k_cutpoints):
        raw_cutpoints = pyro.sample('raw_cutpoints', dist.Normal(torch.arange(k_cutpoints).float(), 1.))
    cutpoints = torch.sort(raw_cutpoints)[0]
    latent_propensity = alpha + beta * verb_type
    with pyro.plate('data', len(verb_type)):
        pyro.sample('obs', dist.OrderedLogistic(latent_propensity, cutpoints), obs=obs)

# Run MCMC for VLM Score
nuts_kernel_vlm = NUTS(vlm_model)
mcmc_vlm = MCMC(nuts_kernel_vlm, num_samples=2000, warmup_steps=1000, num_chains=1)
mcmc_vlm.run(verb_type_tensor, vlm_score_tensor)
vlm_samples = mcmc_vlm.get_samples()

vlm_beta_mean = vlm_samples['beta'].mean().item()
vlm_beta_hdi = torch.quantile(vlm_samples['beta'], torch.tensor([0.025, 0.975]))

print(f"\nVLM Score - Ordered Logistic Regression:")
print(f"  Beta (VerbType effect): {vlm_beta_mean:.3f}")
print(f"  95% HDI: [{vlm_beta_hdi[0]:.3f}, {vlm_beta_hdi[1]:.3f}]")
print(f"  P(beta < 0): {(vlm_samples['beta'] < 0).float().mean():.3f}")
```
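For intuition about what `dist.OrderedLogistic` is doing: under the standard cumulative-logit parameterization (which, to my understanding, is what Pyro implements), the cutpoints slice the latent propensity scale into category probabilities via P(Y ≤ k) = sigmoid(c_k − η). A plain-NumPy sketch of that mapping, with `ordered_logistic_probs` as a made-up helper name:

```python
import numpy as np

def ordered_logistic_probs(eta, cutpoints):
    """Map a latent score eta and ordered cutpoints to category
    probabilities: P(Y <= k) = sigmoid(c_k - eta), and P(Y = k)
    is the difference of adjacent cumulative probabilities."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    cum = sigmoid(np.asarray(cutpoints, dtype=float) - eta)  # K-1 cumulative probs
    cum = np.append(cum, 1.0)                                # P(Y <= K-1) = 1
    return np.diff(cum, prepend=0.0)                         # K category probabilities

# Three cutpoints -> four ordinal categories; raising eta shifts
# probability mass toward the higher score categories.
cuts = [-1.0, 0.0, 1.5]
p_low = ordered_logistic_probs(0.0, cuts)
p_high = ordered_logistic_probs(2.0, cuts)
```

So a negative beta means the verb type coded as 1 gets a lower latent propensity, i.e. its probability mass shifts toward the lower VLM score categories.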
prob=0.818]Sample: 60%|█████▉ | 1785/3000 [00:40, 43.79it/s, step size=1.33e-01, acc. prob=0.817]Sample: 60%|█████▉ | 1790/3000 [00:40, 43.91it/s, step size=1.33e-01, acc. prob=0.817]Sample: 60%|█████▉ | 1795/3000 [00:40, 45.19it/s, step size=1.33e-01, acc. prob=0.817]Sample: 60%|██████ | 1800/3000 [00:41, 44.87it/s, step size=1.33e-01, acc. prob=0.816]Sample: 60%|██████ | 1806/3000 [00:41, 45.76it/s, step size=1.33e-01, acc. prob=0.816]Sample: 60%|██████ | 1811/3000 [00:41, 45.64it/s, step size=1.33e-01, acc. prob=0.816]Sample: 61%|██████ | 1817/3000 [00:41, 47.78it/s, step size=1.33e-01, acc. prob=0.815]Sample: 61%|██████ | 1824/3000 [00:41, 52.66it/s, step size=1.33e-01, acc. prob=0.815]Sample: 61%|██████ | 1830/3000 [00:41, 53.93it/s, step size=1.33e-01, acc. prob=0.815]Sample: 61%|██████ | 1836/3000 [00:41, 48.59it/s, step size=1.33e-01, acc. prob=0.815]Sample: 61%|██████▏ | 1841/3000 [00:41, 47.19it/s, step size=1.33e-01, acc. prob=0.815]Sample: 62%|██████▏ | 1846/3000 [00:42, 45.67it/s, step size=1.33e-01, acc. prob=0.815]Sample: 62%|██████▏ | 1851/3000 [00:42, 44.83it/s, step size=1.33e-01, acc. prob=0.815]Sample: 62%|██████▏ | 1857/3000 [00:42, 46.42it/s, step size=1.33e-01, acc. prob=0.815]Sample: 62%|██████▏ | 1862/3000 [00:42, 45.67it/s, step size=1.33e-01, acc. prob=0.815]Sample: 62%|██████▏ | 1867/3000 [00:42, 43.99it/s, step size=1.33e-01, acc. prob=0.814]Sample: 62%|██████▏ | 1872/3000 [00:42, 42.63it/s, step size=1.33e-01, acc. prob=0.814]Sample: 63%|██████▎ | 1878/3000 [00:42, 45.71it/s, step size=1.33e-01, acc. prob=0.814]Sample: 63%|██████▎ | 1885/3000 [00:42, 48.05it/s, step size=1.33e-01, acc. prob=0.813]Sample: 63%|██████▎ | 1890/3000 [00:42, 48.41it/s, step size=1.33e-01, acc. prob=0.813]Sample: 63%|██████▎ | 1896/3000 [00:43, 50.01it/s, step size=1.33e-01, acc. prob=0.814]Sample: 63%|██████▎ | 1902/3000 [00:43, 42.03it/s, step size=1.33e-01, acc. prob=0.813]Sample: 64%|██████▎ | 1908/3000 [00:43, 46.18it/s, step size=1.33e-01, acc. 
prob=0.814]Sample: 64%|██████▍ | 1913/3000 [00:43, 44.16it/s, step size=1.33e-01, acc. prob=0.813]Sample: 64%|██████▍ | 1918/3000 [00:43, 42.72it/s, step size=1.33e-01, acc. prob=0.814]Sample: 64%|██████▍ | 1923/3000 [00:43, 44.19it/s, step size=1.33e-01, acc. prob=0.814]Sample: 64%|██████▍ | 1928/3000 [00:43, 42.68it/s, step size=1.33e-01, acc. prob=0.814]Sample: 64%|██████▍ | 1933/3000 [00:43, 41.38it/s, step size=1.33e-01, acc. prob=0.814]Sample: 65%|██████▍ | 1938/3000 [00:44, 40.03it/s, step size=1.33e-01, acc. prob=0.813]Sample: 65%|██████▍ | 1943/3000 [00:44, 38.32it/s, step size=1.33e-01, acc. prob=0.814]Sample: 65%|██████▍ | 1947/3000 [00:44, 37.43it/s, step size=1.33e-01, acc. prob=0.814]Sample: 65%|██████▌ | 1952/3000 [00:44, 38.02it/s, step size=1.33e-01, acc. prob=0.814]Sample: 65%|██████▌ | 1956/3000 [00:44, 37.19it/s, step size=1.33e-01, acc. prob=0.813]Sample: 65%|██████▌ | 1962/3000 [00:44, 41.02it/s, step size=1.33e-01, acc. prob=0.813]Sample: 66%|██████▌ | 1968/3000 [00:44, 43.31it/s, step size=1.33e-01, acc. prob=0.813]Sample: 66%|██████▌ | 1973/3000 [00:45, 38.80it/s, step size=1.33e-01, acc. prob=0.813]Sample: 66%|██████▌ | 1977/3000 [00:45, 37.71it/s, step size=1.33e-01, acc. prob=0.813]Sample: 66%|██████▌ | 1983/3000 [00:45, 42.59it/s, step size=1.33e-01, acc. prob=0.813]Sample: 66%|██████▋ | 1989/3000 [00:45, 45.80it/s, step size=1.33e-01, acc. prob=0.813]Sample: 66%|██████▋ | 1994/3000 [00:45, 45.88it/s, step size=1.33e-01, acc. prob=0.813]Sample: 67%|██████▋ | 1999/3000 [00:45, 44.44it/s, step size=1.33e-01, acc. prob=0.813]Sample: 67%|██████▋ | 2004/3000 [00:45, 40.67it/s, step size=1.33e-01, acc. prob=0.813]Sample: 67%|██████▋ | 2011/3000 [00:45, 46.31it/s, step size=1.33e-01, acc. prob=0.813]Sample: 67%|██████▋ | 2016/3000 [00:45, 47.09it/s, step size=1.33e-01, acc. prob=0.813]Sample: 67%|██████▋ | 2022/3000 [00:46, 48.84it/s, step size=1.33e-01, acc. 
prob=0.813]Sample: 68%|██████▊ | 2028/3000 [00:46, 50.08it/s, step size=1.33e-01, acc. prob=0.813]Sample: 68%|██████▊ | 2034/3000 [00:46, 52.31it/s, step size=1.33e-01, acc. prob=0.814]Sample: 68%|██████▊ | 2041/3000 [00:46, 55.09it/s, step size=1.33e-01, acc. prob=0.814]Sample: 68%|██████▊ | 2048/3000 [00:46, 58.61it/s, step size=1.33e-01, acc. prob=0.815]Sample: 69%|██████▊ | 2056/3000 [00:46, 61.76it/s, step size=1.33e-01, acc. prob=0.815]Sample: 69%|██████▉ | 2063/3000 [00:46, 61.70it/s, step size=1.33e-01, acc. prob=0.814]Sample: 69%|██████▉ | 2070/3000 [00:46, 54.64it/s, step size=1.33e-01, acc. prob=0.813]Sample: 69%|██████▉ | 2077/3000 [00:47, 58.25it/s, step size=1.33e-01, acc. prob=0.814]Sample: 69%|██████▉ | 2083/3000 [00:47, 54.64it/s, step size=1.33e-01, acc. prob=0.814]Sample: 70%|██████▉ | 2089/3000 [00:47, 52.32it/s, step size=1.33e-01, acc. prob=0.814]Sample: 70%|██████▉ | 2095/3000 [00:47, 50.85it/s, step size=1.33e-01, acc. prob=0.815]Sample: 70%|███████ | 2101/3000 [00:47, 48.48it/s, step size=1.33e-01, acc. prob=0.815]Sample: 70%|███████ | 2106/3000 [00:47, 48.10it/s, step size=1.33e-01, acc. prob=0.815]Sample: 70%|███████ | 2111/3000 [00:47, 46.73it/s, step size=1.33e-01, acc. prob=0.814]Sample: 71%|███████ | 2117/3000 [00:47, 48.51it/s, step size=1.33e-01, acc. prob=0.814]Sample: 71%|███████ | 2123/3000 [00:47, 48.29it/s, step size=1.33e-01, acc. prob=0.815]Sample: 71%|███████ | 2131/3000 [00:48, 53.88it/s, step size=1.33e-01, acc. prob=0.815]Sample: 71%|███████ | 2137/3000 [00:48, 50.23it/s, step size=1.33e-01, acc. prob=0.815]Sample: 71%|███████▏ | 2143/3000 [00:48, 44.17it/s, step size=1.33e-01, acc. prob=0.815]Sample: 72%|███████▏ | 2148/3000 [00:48, 40.41it/s, step size=1.33e-01, acc. prob=0.815]Sample: 72%|███████▏ | 2153/3000 [00:48, 42.55it/s, step size=1.33e-01, acc. prob=0.815]Sample: 72%|███████▏ | 2158/3000 [00:48, 41.32it/s, step size=1.33e-01, acc. 
prob=0.815]Sample: 72%|███████▏ | 2163/3000 [00:48, 40.00it/s, step size=1.33e-01, acc. prob=0.816]Sample: 72%|███████▏ | 2168/3000 [00:49, 39.08it/s, step size=1.33e-01, acc. prob=0.816]Sample: 72%|███████▏ | 2173/3000 [00:49, 39.63it/s, step size=1.33e-01, acc. prob=0.817]Sample: 73%|███████▎ | 2178/3000 [00:49, 41.72it/s, step size=1.33e-01, acc. prob=0.817]Sample: 73%|███████▎ | 2184/3000 [00:49, 44.64it/s, step size=1.33e-01, acc. prob=0.817]Sample: 73%|███████▎ | 2189/3000 [00:49, 42.35it/s, step size=1.33e-01, acc. prob=0.817]Sample: 73%|███████▎ | 2194/3000 [00:49, 42.32it/s, step size=1.33e-01, acc. prob=0.817]Sample: 73%|███████▎ | 2199/3000 [00:49, 44.29it/s, step size=1.33e-01, acc. prob=0.816]Sample: 73%|███████▎ | 2204/3000 [00:49, 45.51it/s, step size=1.33e-01, acc. prob=0.817]Sample: 74%|███████▎ | 2210/3000 [00:49, 46.74it/s, step size=1.33e-01, acc. prob=0.817]Sample: 74%|███████▍ | 2217/3000 [00:50, 52.66it/s, step size=1.33e-01, acc. prob=0.818]Sample: 74%|███████▍ | 2223/3000 [00:50, 48.06it/s, step size=1.33e-01, acc. prob=0.818]Sample: 74%|███████▍ | 2228/3000 [00:50, 46.09it/s, step size=1.33e-01, acc. prob=0.818]Sample: 74%|███████▍ | 2233/3000 [00:50, 43.67it/s, step size=1.33e-01, acc. prob=0.818]Sample: 75%|███████▍ | 2238/3000 [00:50, 42.05it/s, step size=1.33e-01, acc. prob=0.819]Sample: 75%|███████▍ | 2245/3000 [00:50, 46.71it/s, step size=1.33e-01, acc. prob=0.819]Sample: 75%|███████▌ | 2250/3000 [00:50, 39.93it/s, step size=1.33e-01, acc. prob=0.819]Sample: 75%|███████▌ | 2256/3000 [00:51, 42.21it/s, step size=1.33e-01, acc. prob=0.818]Sample: 75%|███████▌ | 2261/3000 [00:51, 38.85it/s, step size=1.33e-01, acc. prob=0.818]Sample: 76%|███████▌ | 2266/3000 [00:51, 37.01it/s, step size=1.33e-01, acc. prob=0.818]Sample: 76%|███████▌ | 2272/3000 [00:51, 40.65it/s, step size=1.33e-01, acc. prob=0.818]Sample: 76%|███████▌ | 2278/3000 [00:51, 44.22it/s, step size=1.33e-01, acc. 
prob=0.818]Sample: 76%|███████▌ | 2283/3000 [00:51, 41.89it/s, step size=1.33e-01, acc. prob=0.818]Sample: 76%|███████▋ | 2288/3000 [00:51, 41.60it/s, step size=1.33e-01, acc. prob=0.819]Sample: 77%|███████▋ | 2296/3000 [00:51, 50.22it/s, step size=1.33e-01, acc. prob=0.819]Sample: 77%|███████▋ | 2302/3000 [00:52, 51.49it/s, step size=1.33e-01, acc. prob=0.819]Sample: 77%|███████▋ | 2308/3000 [00:52, 49.73it/s, step size=1.33e-01, acc. prob=0.819]Sample: 77%|███████▋ | 2314/3000 [00:52, 50.62it/s, step size=1.33e-01, acc. prob=0.819]Sample: 77%|███████▋ | 2320/3000 [00:52, 49.40it/s, step size=1.33e-01, acc. prob=0.819]Sample: 78%|███████▊ | 2328/3000 [00:52, 57.17it/s, step size=1.33e-01, acc. prob=0.819]Sample: 78%|███████▊ | 2336/3000 [00:52, 59.14it/s, step size=1.33e-01, acc. prob=0.819]Sample: 78%|███████▊ | 2343/3000 [00:52, 58.33it/s, step size=1.33e-01, acc. prob=0.819]Sample: 78%|███████▊ | 2350/3000 [00:52, 60.45it/s, step size=1.33e-01, acc. prob=0.819]Sample: 79%|███████▊ | 2357/3000 [00:53, 48.28it/s, step size=1.33e-01, acc. prob=0.818]Sample: 79%|███████▉ | 2363/3000 [00:53, 46.87it/s, step size=1.33e-01, acc. prob=0.818]Sample: 79%|███████▉ | 2370/3000 [00:53, 51.05it/s, step size=1.33e-01, acc. prob=0.819]Sample: 79%|███████▉ | 2376/3000 [00:53, 50.01it/s, step size=1.33e-01, acc. prob=0.819]Sample: 79%|███████▉ | 2382/3000 [00:53, 45.27it/s, step size=1.33e-01, acc. prob=0.818]Sample: 80%|███████▉ | 2387/3000 [00:53, 44.03it/s, step size=1.33e-01, acc. prob=0.818]Sample: 80%|███████▉ | 2392/3000 [00:53, 44.89it/s, step size=1.33e-01, acc. prob=0.818]Sample: 80%|███████▉ | 2397/3000 [00:53, 45.84it/s, step size=1.33e-01, acc. prob=0.818]Sample: 80%|████████ | 2403/3000 [00:54, 46.84it/s, step size=1.33e-01, acc. prob=0.818]Sample: 80%|████████ | 2408/3000 [00:54, 35.64it/s, step size=1.33e-01, acc. prob=0.818]Sample: 80%|████████ | 2413/3000 [00:54, 37.50it/s, step size=1.33e-01, acc. 
prob=0.818]Sample: 81%|████████ | 2420/3000 [00:54, 40.70it/s, step size=1.33e-01, acc. prob=0.819]Sample: 81%|████████ | 2426/3000 [00:54, 44.01it/s, step size=1.33e-01, acc. prob=0.819]Sample: 81%|████████ | 2432/3000 [00:54, 46.67it/s, step size=1.33e-01, acc. prob=0.819]Sample: 81%|████████▏ | 2439/3000 [00:54, 50.75it/s, step size=1.33e-01, acc. prob=0.819]Sample: 82%|████████▏ | 2445/3000 [00:55, 47.09it/s, step size=1.33e-01, acc. prob=0.819]Sample: 82%|████████▏ | 2451/3000 [00:55, 49.42it/s, step size=1.33e-01, acc. prob=0.820]Sample: 82%|████████▏ | 2457/3000 [00:55, 46.51it/s, step size=1.33e-01, acc. prob=0.820]Sample: 82%|████████▏ | 2462/3000 [00:55, 34.63it/s, step size=1.33e-01, acc. prob=0.819]Sample: 82%|████████▏ | 2466/3000 [00:55, 29.11it/s, step size=1.33e-01, acc. prob=0.819]Sample: 82%|████████▏ | 2470/3000 [00:55, 26.04it/s, step size=1.33e-01, acc. prob=0.819]Sample: 82%|████████▏ | 2473/3000 [00:56, 26.62it/s, step size=1.33e-01, acc. prob=0.819]Sample: 83%|████████▎ | 2477/3000 [00:56, 27.93it/s, step size=1.33e-01, acc. prob=0.820]Sample: 83%|████████▎ | 2481/3000 [00:56, 28.14it/s, step size=1.33e-01, acc. prob=0.819]Sample: 83%|████████▎ | 2486/3000 [00:56, 32.91it/s, step size=1.33e-01, acc. prob=0.820]Sample: 83%|████████▎ | 2490/3000 [00:56, 34.44it/s, step size=1.33e-01, acc. prob=0.820]Sample: 83%|████████▎ | 2494/3000 [00:56, 30.07it/s, step size=1.33e-01, acc. prob=0.820]Sample: 83%|████████▎ | 2498/3000 [00:56, 31.48it/s, step size=1.33e-01, acc. prob=0.820]Sample: 83%|████████▎ | 2502/3000 [00:56, 32.61it/s, step size=1.33e-01, acc. prob=0.820]Sample: 84%|████████▎ | 2509/3000 [00:57, 35.79it/s, step size=1.33e-01, acc. prob=0.820]Sample: 84%|████████▍ | 2513/3000 [00:57, 28.67it/s, step size=1.33e-01, acc. prob=0.820]Sample: 84%|████████▍ | 2517/3000 [00:57, 30.08it/s, step size=1.33e-01, acc. prob=0.820]Sample: 84%|████████▍ | 2524/3000 [00:57, 36.98it/s, step size=1.33e-01, acc. 
prob=0.820]Sample: 84%|████████▍ | 2528/3000 [00:57, 36.84it/s, step size=1.33e-01, acc. prob=0.820]Sample: 84%|████████▍ | 2532/3000 [00:57, 36.74it/s, step size=1.33e-01, acc. prob=0.820]Sample: 85%|████████▍ | 2538/3000 [00:57, 41.49it/s, step size=1.33e-01, acc. prob=0.820]Sample: 85%|████████▍ | 2543/3000 [00:58, 41.47it/s, step size=1.33e-01, acc. prob=0.820]Sample: 85%|████████▍ | 2548/3000 [00:58, 43.17it/s, step size=1.33e-01, acc. prob=0.820]Sample: 85%|████████▌ | 2553/3000 [00:58, 41.60it/s, step size=1.33e-01, acc. prob=0.819]Sample: 85%|████████▌ | 2558/3000 [00:58, 35.82it/s, step size=1.33e-01, acc. prob=0.819]Sample: 85%|████████▌ | 2562/3000 [00:58, 34.57it/s, step size=1.33e-01, acc. prob=0.819]Sample: 86%|████████▌ | 2566/3000 [00:58, 35.06it/s, step size=1.33e-01, acc. prob=0.819]Sample: 86%|████████▌ | 2572/3000 [00:58, 40.36it/s, step size=1.33e-01, acc. prob=0.819]Sample: 86%|████████▌ | 2577/3000 [00:58, 41.61it/s, step size=1.33e-01, acc. prob=0.819]Sample: 86%|████████▌ | 2583/3000 [00:59, 43.56it/s, step size=1.33e-01, acc. prob=0.819]Sample: 86%|████████▋ | 2588/3000 [00:59, 41.25it/s, step size=1.33e-01, acc. prob=0.819]Sample: 86%|████████▋ | 2593/3000 [00:59, 39.77it/s, step size=1.33e-01, acc. prob=0.819]Sample: 87%|████████▋ | 2599/3000 [00:59, 43.91it/s, step size=1.33e-01, acc. prob=0.819]Sample: 87%|████████▋ | 2605/3000 [00:59, 46.06it/s, step size=1.33e-01, acc. prob=0.818]Sample: 87%|████████▋ | 2611/3000 [00:59, 46.99it/s, step size=1.33e-01, acc. prob=0.818]Sample: 87%|████████▋ | 2616/3000 [00:59, 46.61it/s, step size=1.33e-01, acc. prob=0.818]Sample: 87%|████████▋ | 2622/3000 [00:59, 49.04it/s, step size=1.33e-01, acc. prob=0.818]Sample: 88%|████████▊ | 2627/3000 [00:59, 48.16it/s, step size=1.33e-01, acc. prob=0.818]Sample: 88%|████████▊ | 2632/3000 [01:00, 44.14it/s, step size=1.33e-01, acc. prob=0.818]Sample: 88%|████████▊ | 2637/3000 [01:00, 40.28it/s, step size=1.33e-01, acc. 
prob=0.818]Sample: 88%|████████▊ | 2643/3000 [01:00, 44.26it/s, step size=1.33e-01, acc. prob=0.818]Sample: 88%|████████▊ | 2649/3000 [01:00, 46.40it/s, step size=1.33e-01, acc. prob=0.817]Sample: 88%|████████▊ | 2655/3000 [01:00, 49.78it/s, step size=1.33e-01, acc. prob=0.817]Sample: 89%|████████▊ | 2661/3000 [01:00, 52.43it/s, step size=1.33e-01, acc. prob=0.818]Sample: 89%|████████▉ | 2667/3000 [01:00, 52.67it/s, step size=1.33e-01, acc. prob=0.818]Sample: 89%|████████▉ | 2674/3000 [01:00, 55.58it/s, step size=1.33e-01, acc. prob=0.818]Sample: 89%|████████▉ | 2680/3000 [01:01, 50.34it/s, step size=1.33e-01, acc. prob=0.818]Sample: 90%|████████▉ | 2686/3000 [01:01, 47.13it/s, step size=1.33e-01, acc. prob=0.818]Sample: 90%|████████▉ | 2691/3000 [01:01, 47.65it/s, step size=1.33e-01, acc. prob=0.818]Sample: 90%|████████▉ | 2696/3000 [01:01, 48.23it/s, step size=1.33e-01, acc. prob=0.818]Sample: 90%|█████████ | 2702/3000 [01:01, 49.88it/s, step size=1.33e-01, acc. prob=0.817]Sample: 90%|█████████ | 2708/3000 [01:01, 52.57it/s, step size=1.33e-01, acc. prob=0.818]Sample: 90%|█████████ | 2714/3000 [01:01, 47.08it/s, step size=1.33e-01, acc. prob=0.818]Sample: 91%|█████████ | 2719/3000 [01:01, 47.69it/s, step size=1.33e-01, acc. prob=0.818]Sample: 91%|█████████ | 2725/3000 [01:02, 48.68it/s, step size=1.33e-01, acc. prob=0.818]Sample: 91%|█████████ | 2730/3000 [01:02, 48.86it/s, step size=1.33e-01, acc. prob=0.818]Sample: 91%|█████████ | 2736/3000 [01:02, 50.33it/s, step size=1.33e-01, acc. prob=0.818]Sample: 91%|█████████▏| 2742/3000 [01:02, 51.33it/s, step size=1.33e-01, acc. prob=0.818]Sample: 92%|█████████▏| 2748/3000 [01:02, 52.38it/s, step size=1.33e-01, acc. prob=0.818]Sample: 92%|█████████▏| 2755/3000 [01:02, 54.27it/s, step size=1.33e-01, acc. prob=0.818]Sample: 92%|█████████▏| 2761/3000 [01:02, 54.43it/s, step size=1.33e-01, acc. prob=0.818]Sample: 92%|█████████▏| 2767/3000 [01:02, 49.89it/s, step size=1.33e-01, acc. 
prob=0.818]Sample: 92%|█████████▏| 2773/3000 [01:02, 51.69it/s, step size=1.33e-01, acc. prob=0.819]Sample: 93%|█████████▎| 2779/3000 [01:03, 51.27it/s, step size=1.33e-01, acc. prob=0.818]Sample: 93%|█████████▎| 2785/3000 [01:03, 47.45it/s, step size=1.33e-01, acc. prob=0.819]Sample: 93%|█████████▎| 2790/3000 [01:03, 45.82it/s, step size=1.33e-01, acc. prob=0.819]Sample: 93%|█████████▎| 2795/3000 [01:03, 43.25it/s, step size=1.33e-01, acc. prob=0.819]Sample: 93%|█████████▎| 2801/3000 [01:03, 46.17it/s, step size=1.33e-01, acc. prob=0.819]Sample: 94%|█████████▎| 2806/3000 [01:03, 44.44it/s, step size=1.33e-01, acc. prob=0.819]Sample: 94%|█████████▎| 2811/3000 [01:03, 42.76it/s, step size=1.33e-01, acc. prob=0.819]Sample: 94%|█████████▍| 2816/3000 [01:03, 43.58it/s, step size=1.33e-01, acc. prob=0.819]Sample: 94%|█████████▍| 2821/3000 [01:04, 41.39it/s, step size=1.33e-01, acc. prob=0.819]Sample: 94%|█████████▍| 2827/3000 [01:04, 44.76it/s, step size=1.33e-01, acc. prob=0.819]Sample: 94%|█████████▍| 2832/3000 [01:04, 43.43it/s, step size=1.33e-01, acc. prob=0.819]Sample: 95%|█████████▍| 2837/3000 [01:04, 45.07it/s, step size=1.33e-01, acc. prob=0.819]Sample: 95%|█████████▍| 2842/3000 [01:04, 37.90it/s, step size=1.33e-01, acc. prob=0.819]Sample: 95%|█████████▍| 2847/3000 [01:04, 36.09it/s, step size=1.33e-01, acc. prob=0.819]Sample: 95%|█████████▌| 2852/3000 [01:04, 39.32it/s, step size=1.33e-01, acc. prob=0.819]Sample: 95%|█████████▌| 2857/3000 [01:04, 37.43it/s, step size=1.33e-01, acc. prob=0.819]Sample: 95%|█████████▌| 2862/3000 [01:05, 38.44it/s, step size=1.33e-01, acc. prob=0.819]Sample: 96%|█████████▌| 2866/3000 [01:05, 38.39it/s, step size=1.33e-01, acc. prob=0.819]Sample: 96%|█████████▌| 2870/3000 [01:05, 38.17it/s, step size=1.33e-01, acc. prob=0.819]Sample: 96%|█████████▌| 2874/3000 [01:05, 36.33it/s, step size=1.33e-01, acc. prob=0.819]Sample: 96%|█████████▌| 2880/3000 [01:05, 40.13it/s, step size=1.33e-01, acc. 
prob=0.818]Sample: 96%|█████████▌| 2885/3000 [01:05, 38.69it/s, step size=1.33e-01, acc. prob=0.818]Sample: 96%|█████████▋| 2889/3000 [01:05, 38.80it/s, step size=1.33e-01, acc. prob=0.819]Sample: 96%|█████████▋| 2893/3000 [01:05, 38.06it/s, step size=1.33e-01, acc. prob=0.818]Sample: 97%|█████████▋| 2898/3000 [01:06, 36.94it/s, step size=1.33e-01, acc. prob=0.818]Sample: 97%|█████████▋| 2902/3000 [01:06, 36.93it/s, step size=1.33e-01, acc. prob=0.818]Sample: 97%|█████████▋| 2907/3000 [01:06, 38.19it/s, step size=1.33e-01, acc. prob=0.818]Sample: 97%|█████████▋| 2911/3000 [01:06, 35.59it/s, step size=1.33e-01, acc. prob=0.818]Sample: 97%|█████████▋| 2918/3000 [01:06, 43.71it/s, step size=1.33e-01, acc. prob=0.819]Sample: 97%|█████████▋| 2923/3000 [01:06, 33.10it/s, step size=1.33e-01, acc. prob=0.818]Sample: 98%|█████████▊| 2927/3000 [01:06, 28.98it/s, step size=1.33e-01, acc. prob=0.818]Sample: 98%|█████████▊| 2933/3000 [01:07, 33.74it/s, step size=1.33e-01, acc. prob=0.818]Sample: 98%|█████████▊| 2939/3000 [01:07, 38.34it/s, step size=1.33e-01, acc. prob=0.818]Sample: 98%|█████████▊| 2944/3000 [01:07, 35.64it/s, step size=1.33e-01, acc. prob=0.819]Sample: 98%|█████████▊| 2948/3000 [01:07, 27.07it/s, step size=1.33e-01, acc. prob=0.818]Sample: 98%|█████████▊| 2952/3000 [01:07, 24.12it/s, step size=1.33e-01, acc. prob=0.818]Sample: 98%|█████████▊| 2955/3000 [01:08, 20.31it/s, step size=1.33e-01, acc. prob=0.818]Sample: 99%|█████████▊| 2958/3000 [01:08, 21.55it/s, step size=1.33e-01, acc. prob=0.818]Sample: 99%|█████████▉| 2963/3000 [01:08, 26.52it/s, step size=1.33e-01, acc. prob=0.818]Sample: 99%|█████████▉| 2968/3000 [01:08, 29.65it/s, step size=1.33e-01, acc. prob=0.818]Sample: 99%|█████████▉| 2972/3000 [01:08, 28.86it/s, step size=1.33e-01, acc. prob=0.818]Sample: 99%|█████████▉| 2976/3000 [01:08, 26.06it/s, step size=1.33e-01, acc. prob=0.818]Sample: 99%|█████████▉| 2979/3000 [01:09, 19.28it/s, step size=1.33e-01, acc. 
prob=0.818]Sample: 99%|█████████▉| 2982/3000 [01:09, 16.36it/s, step size=1.33e-01, acc. prob=0.818]Sample: 99%|█████████▉| 2984/3000 [01:09, 15.05it/s, step size=1.33e-01, acc. prob=0.818]Sample: 100%|█████████▉| 2987/3000 [01:09, 16.14it/s, step size=1.33e-01, acc. prob=0.818]Sample: 100%|█████████▉| 2990/3000 [01:09, 18.18it/s, step size=1.33e-01, acc. prob=0.818]Sample: 100%|█████████▉| 2993/3000 [01:10, 13.16it/s, step size=1.33e-01, acc. prob=0.818]Sample: 100%|█████████▉| 2995/3000 [01:10, 12.66it/s, step size=1.33e-01, acc. prob=0.818]Sample: 100%|█████████▉| 2997/3000 [01:10, 11.97it/s, step size=1.33e-01, acc. prob=0.818]Sample: 100%|█████████▉| 2999/3000 [01:10, 12.58it/s, step size=1.33e-01, acc. prob=0.818]Sample: 100%|██████████| 3000/3000 [01:10, 42.45it/s, step size=1.33e-01, acc. prob=0.818]
VLM Score - Ordered Logistic Regression:
Beta (VerbType effect): -2.352
95% HDI: [-4.137, -0.612]
P(beta < 0): 0.994
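These summaries come straight from the posterior samples. Here is a hedged sketch of how they can be computed, using synthetic draws in place of the real MCMC output (the actual `beta` samples aren't reproduced here); note that a plain percentile interval is equal-tailed rather than a true HDI, which would come from something like `arviz.hdi`, but the two are close for a roughly symmetric posterior:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for the real MCMC draws of beta (illustration only)
beta_samples = rng.normal(loc=-2.35, scale=0.9, size=4000)

beta_mean = beta_samples.mean()                             # posterior mean
ci_low, ci_high = np.percentile(beta_samples, [2.5, 97.5])  # equal-tailed 95% interval
p_negative = (beta_samples < 0).mean()                      # P(beta < 0), probability of direction

print(f"Beta (VerbType effect): {beta_mean:.3f}")
print(f"95% interval: [{ci_low:.3f}, {ci_high:.3f}]")
print(f"P(beta < 0): {p_negative:.3f}")
```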
How to Read This Plot
We used a contrast-coding scheme for our analysis: unaccusatives were assigned +0.5 and unergatives −0.5. With this coding, our “Beta” (β) directly represents the difference: Unaccusative minus Unergative.
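To see why ±0.5 coding makes β the raw group difference, here is a tiny sketch with toy numbers (not the real ratings): an ordinary least-squares fit with this coding recovers exactly mean(unaccusative) − mean(unergative).

```python
import numpy as np

# Toy scores for illustration only (not the real stimulus ratings)
unergative = np.array([5.0, 6.0, 5.5, 6.5])    # coded -0.5
unaccusative = np.array([3.0, 4.0, 3.5, 4.5])  # coded +0.5

y = np.concatenate([unergative, unaccusative])
x = np.concatenate([np.full(4, -0.5), np.full(4, 0.5)])

# Design matrix with intercept; solve ordinary least squares
X = np.column_stack([np.ones_like(x), x])
intercept, beta = np.linalg.lstsq(X, y, rcond=None)[0]

# With +/-0.5 coding, beta equals the plain difference of group means
print(beta)  # ≈ -2.0: unaccusatives score 2 points lower
```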
- The Zero Line (The “Null”)
The vertical gray line at 0 represents “no difference”. If a model’s “cigar” is centered here, it means the model treats both picture types exactly the same.
- The Left Side (Negative β)
If the distribution is on the left, the score for Unaccusatives was lower than Unergatives.
The Finding: This is the “Danger Zone” for our stimuli. It means Unaccusative pictures are harder for the models to understand or identify.
Our Result: Both Full Scene (CLIP) and Scene Verification (VLM) are shifted heavily to the left. This tells us that, visually speaking, the unaccusative pictures are significantly less clear or representative than the unergative ones.
- The Right Side (Positive β)
If the distribution were on the right, it would mean Unaccusatives scored higher.
The Finding: This would suggest Unaccusative pictures are actually “better” or “easier” than Unergative ones.
Our Result: None of our models show this pattern.
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
# Posterior beta samples from each of the three MCMC runs
beta_data = {
'Full Scene (CLIP)': clip_samples['beta'].numpy(),
'Subject Salience (CLIP)': subject_samples['beta'].numpy(),
'Scene Verification (VLM)': vlm_samples['beta'].numpy()
}
# Adjust figure size for better vertical separation
fig, ax = plt.subplots(figsize=(14, 7))
sns.set_style("whitegrid", {'axes.grid': True, 'grid.color': '.95'})
labels = list(beta_data.keys())
colors = ['#3498db', '#9b59b6', '#e74c3c']
for i, label in enumerate(labels):
    samples = beta_data[label]
    mean_val = samples.mean()
    # 1. Calculate multiple intervals for the "stacking" effect
    hdi_95 = np.percentile(samples, [2.5, 97.5])
    hdi_80 = np.percentile(samples, [10, 90])
    hdi_50 = np.percentile(samples, [25, 75])
    # 2. Plot the stacked lines (bottom to top: thinnest/widest first)
    # 95% interval - thin
    ax.hlines(i, hdi_95[0], hdi_95[1], color=colors[i], linewidth=1.5, alpha=0.4, zorder=1)
    # 80% interval - medium
    ax.hlines(i, hdi_80[0], hdi_80[1], color=colors[i], linewidth=5.0, alpha=0.7, zorder=2)
    # 50% interval - thick
    ax.hlines(i, hdi_50[0], hdi_50[1], color=colors[i], linewidth=10.0, alpha=1.0, zorder=3)
    # 3. Plot the mean point
    ax.plot(mean_val, i, 'o', color='white', markersize=8, zorder=4)
    # 4. Aligned probability-of-direction labels
    p_dir = (samples < 0).mean() if mean_val < 0 else (samples > 0).mean()
    prob_text = f"$P(\\beta {'<' if mean_val < 0 else '>'} 0) = {p_dir:.2f}$"
    # Locked to y-coordinate 'i' and x-coordinate 3.0 (outside plot area)
    ax.text(3.0, i, prob_text, va='center', ha='left',
            fontsize=13, fontweight='bold', color=colors[i])
# 5. Descriptive Annotations (The "How to Read" Guide)
ax.axvline(x=0, color='black', linestyle='-', linewidth=1.5, alpha=0.6, zorder=0)
# Arrow pointing Left (Negative Beta)
ax.annotate('', xy=(-5, -1.0), xytext=(-0.5, -1.0),
arrowprops=dict(arrowstyle="->", color='gray', lw=1.5))
ax.text(-2.75, -1.4, "Lower Scores for\nUnaccusatives", ha='center', color='gray', fontweight='bold')
# Arrow pointing Right (Positive Beta)
ax.annotate('', xy=(2.5, -1.0), xytext=(0.5, -1.0),
arrowprops=dict(arrowstyle="->", color='gray', lw=1.5))
ax.text(1.5, -1.4, "Lower Scores for\nUnergatives", ha='center', color='gray', fontweight='bold')
# 6. Final Layout Polish
ax.set_yticks(range(len(labels)))
ax.set_yticklabels(labels, fontweight='bold', fontsize=12)
ax.set_xlabel('Posterior Beta Weight (Unaccusative vs. Unergative)', fontsize=13, labelpad=45)
# Lock limits so text and arrows don't shift
ax.set_xlim(-6, 3)
ax.set_ylim(-1.5, len(labels) - 0.5)
sns.despine(left=True, bottom=True)
plt.subplots_adjust(right=0.75, bottom=0.2) # Make room for text on right and guide on bottom
plt.savefig('./model_pyro.png', dpi=300, bbox_inches='tight')
plt.show()
Let’s think about the results again. Even if the 94% interval includes zero for some of these effects, there is a more than moderate chance that unaccusatives are harder to process, and subject salience also appears to be reduced in unaccusative scenes.
| Metric | Posterior β (Effect) | Direction & Certainty |
|---|---|---|
| Scene Verification (VLM) | ~ −2.3 | Strong negative effect: the VLM consistently rates unaccusative scenes lower. P(β<0)=0.99 indicates very high certainty. |
| Full Scene (CLIP) | ~ −2.3 | Strong negative effect: similar to the VLM, CLIP shows lower similarity for unaccusative scenes. P(β<0)=0.92 is quite robust. |
| Subject Salience (CLIP) | ~ −1.5 | Moderate negative effect: the subject is slightly harder to identify in unaccusative scenes, but the evidence is weaker (P=0.83) and the interval is much wider (more uncertainty). |
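To build intuition for what β ≈ −2.3 means on the rating scale, here is a hedged sketch of the cumulative (ordered) logistic link with made-up cutpoints — the real fitted cutpoints aren’t reported in this post, so the exact probabilities below are purely illustrative:

```python
import numpy as np

def ordered_logit_probs(eta, cutpoints):
    """Category probabilities under a cumulative-logit model:
    P(Y <= k) = sigmoid(c_k - eta); differencing gives per-category mass."""
    cum = 1.0 / (1.0 + np.exp(-(np.asarray(cutpoints) - eta)))
    cum = np.concatenate([[0.0], cum, [1.0]])
    return np.diff(cum)

cutpoints = [-2.0, -0.5, 1.0, 2.5]  # hypothetical cutpoints for a 1-5 scale
beta = -2.3                          # roughly the posterior mean from the table above

p_unerg = ordered_logit_probs(-0.5 * beta, cutpoints)  # unergative coded -0.5
p_unacc = ordered_logit_probs(+0.5 * beta, cutpoints)  # unaccusative coded +0.5

# The negative beta shifts probability mass toward low ratings for unaccusatives
print(p_unerg.round(3))
print(p_unacc.round(3))
```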
Even before we look at human data, the AI models are telling us: “These pictures aren’t equal.” The unaccusative scenes have a lower “visual-textual fit,” which means we must be careful not to mistake this perceptual “clutter” for a purely linguistic planning effect.
One thing this predicts is that the effects we are seeing could be partially driven by the difficulty of the pictures themselves.
However, given the general picture in the sentence production literature, this seems unlikely. Sauppe’s group found a number of advance-planning cases, where participants were faster to start speaking when they did not need to plan ahead for the verbal elements.
Similarly, Momma and Yoshida showed advance planning in sentence-recall experiments: people were slower to start speaking sentences such as ‘Which computer did you buy and repair?’ when a verb related to repairing was present.
Importantly, this only happened with ‘ATB’ (across-the-board) sentences, and not with parasitic-gap sentences such as ‘Which computer did you repair after buying?’.
The Finding
Here’s what I found, which provides a fascinating, nuanced picture:
- CLIP Similarity: Unaccusative scenes had comparable or slightly higher similarity scores than unergative scenes. This means the visual information is definitely “there.”
- Subject Salience: The subject nouns were equally identifiable in both conditions (strong evidence against the idea that the character is hard to find).
- Qwen-VL-Chat Scores: Here’s the twist. The Qwen model gave significantly lower ratings to the unaccusative image-sentence pairs.
Wait, what does this divergence mean?
Subject Salience: The Baseline is Solid
The subject salience analysis is crucial. By measuring how well CLIP can match the subject noun alone (e.g., “octopus”, “ballerina”) to the image, we confirm that the characters are visually prominent.
The results show that subject nouns are equally identifiable across both verb types. This rules out the simplest form of perceptual difficulty: that participants just can’t find who is in the scene because they are occluded or small.
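Concretely, subject salience reduces to a cosine similarity between CLIP’s embedding of the bare noun and the image embedding. A minimal sketch with stand-in vectors — in the real pipeline the embeddings come from CLIP’s `encode_text` and `encode_image`; the random vectors here only illustrate the computation:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(42)
image_emb = rng.normal(size=512)                        # stand-in for an image embedding
text_emb = image_emb + rng.normal(scale=0.5, size=512)  # correlated stand-in for "octopus"

salience = cosine_similarity(image_emb, text_emb)
print(round(salience, 3))  # high when the subject noun matches the image
```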
The CLIP vs. Qwen Split
The divergence between CLIP (similarity) and Qwen (explicit verification) is actually quite revealing.
- CLIP looks at the raw semantic match. It says, “Yes, this image contains the features of an octopus and boiling.”
- Qwen-VL-Chat, acting more like a human observer asked to make a judgment, says, “This is a bit of a weird way to describe this scene.”
This suggests that while the visual information is clearly present (CLIP), the event integration might indeed be more cognitively demanding (Qwen), mirroring the human behavior we see in the experiment (longer onset latencies).
What This Means
This analysis serves as a powerful diagnostic for my experimental materials:
- ✅ Basic Visibility is Equal: Unaccusative subjects are not harder to see (CLIP Subject Salience).
- ✅ Information is Present: The overall semantic match is strong (CLIP Similarity).
- 🤔 Verification is Harder: The lower Qwen scores suggest intrinsic complexity in mapping these events to sentences.
This doesn’t invalidate the syntactic hypothesis. Instead, it helps us pinpoint where the difficulty lies. It’s not a low-level “I can’t see it” problem (which would be a confound). It’s likely a higher-level “conceptualizing this event is harder” issue—which might be exactly why unaccusative syntax is processed differently!
We can be confident that the onset latency effects aren’t due to bad drawings or hidden characters, but potentially reflect the genuine cognitive cost of encoding these specific types of events.
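For readers who want to put a number on the condition difference, Welch's t-test is one reasonable choice, since it does not assume equal variances across conditions. The sketch below uses only the standard library; the score lists are made-up illustrative values, not my actual results. In practice you would pass the `VLM_Score` columns from the pipeline, or use `scipy.stats.ttest_ind(..., equal_var=False)` to also get a p-value.

```python
# Welch's t statistic for two independent samples (stdlib only).
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """t = (mean(a) - mean(b)) / sqrt(var(a)/n_a + var(b)/n_b)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

# Hypothetical Qwen ratings, NOT the real data:
unerg_scores = [8.0, 9.0, 8.5, 9.5, 8.0]
unacc_scores = [6.0, 7.0, 6.5, 5.5, 7.0]
t = welch_t(unerg_scores, unacc_scores)  # positive if unergatives rated higher
```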
Broader Implications
I think this kind of analysis represents something really exciting about modern psycholinguistics. We’re not just running experiments and hoping for the best—we’re using computational tools to validate our materials in ways that weren’t possible even a few years ago.
Vision-language models like CLIP and multimodal LLMs like Qwen-VL give us principled ways to ask: “Are these pictures doing what we think they’re doing?” The fact that we can now triangulate across different model architectures (similarity-based vs. generative) makes the validation even stronger. CLIP provides fast, quantitative similarity scores, while multimodal LLMs can provide more nuanced, human-interpretable ratings.
Triangulating evidence from CLIP and a multimodal LLM, even when their verdicts diverge as they do here, yields a far more informative validation framework for experimental materials than either model alone.
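One cheap way to check that the two measures are genuinely complementary rather than redundant is to correlate them item by item; a weak correlation means each model contributes distinct information. A standard-library sketch (the dataframe and column names follow the pipeline below and are otherwise assumptions):

```python
# Per-item agreement between CLIP similarity and Qwen ratings.
# A low Pearson r would support the claim that the two measures
# capture different aspects of the image-sentence match.
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# e.g. pearson(df_clip['CLIP_Similarity'].tolist(), df_vlm['VLM_Score'].tolist())
```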
The Code (All Together)
If you want to run this analysis yourself, here’s the complete pipeline with both models:
import torch
import clip
from PIL import Image
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
import re
# Load models
device = "cuda" if torch.cuda.is_available() else "cpu"
model_clip, preprocess = clip.load("ViT-B/32", device=device, jit=False)
# Monkey patch to fix ImportError in Qwen remote code
import transformers
if not hasattr(transformers, "BeamSearchScorer"):
try:
from transformers.generation import BeamSearchScorer
transformers.BeamSearchScorer = BeamSearchScorer
except ImportError:
try:
from transformers.generation.beam_search import BeamSearchScorer
transformers.BeamSearchScorer = BeamSearchScorer
except ImportError:
pass
model_vlm = AutoModelForCausalLM.from_pretrained(
"Qwen/Qwen-VL-Chat",
trust_remote_code=True,
dtype=torch.float32
).to('cpu')
tokenizer_vlm = AutoTokenizer.from_pretrained("Qwen/Qwen-VL-Chat", trust_remote_code=True)
streamer = TextStreamer(tokenizer_vlm, skip_prompt=True)
# Your data here (see above for structure)
# df_unerg = ...
# df_unacc = ...
# CLIP similarity function
def compute_clip_similarity(df, model, preprocess, device):
similarity_scores = []
for _, row in df.iterrows():
img = preprocess(Image.open(row['Filename'])).unsqueeze(0).to(device)
text = clip.tokenize([row['Sentence']]).to(device)
with torch.no_grad():
logits_per_image, _ = model(img, text)
similarity_scores.append(logits_per_image.item())
df_copy = df.copy()
df_copy['CLIP_Similarity'] = similarity_scores
return df_copy
# Qwen-VL scoring function
def compute_qwen_scores(df, model, tokenizer, streamer=None):
import re
scores = []
for _, row in df.iterrows():
# Create query for Qwen-VL-Chat
query = tokenizer.from_list_format([
{'image': row['Filename']},
        {'text': f'Rate how well this sentence describes the image: "{row["Sentence"]}"\nScore from 1-10 (1=mismatch, 10=perfect match). Reply with just the number.'},
])
with torch.no_grad():
response, _ = model.chat(tokenizer, query=query, history=None, streamer=streamer)
match = re.search(r'(\d+(?:\.\d+)?)', response)
score = float(match.group(1)) if match else 5.0
score = min(10.0, max(1.0, score))
scores.append(score)
df_copy = df.copy()
df_copy['VLM_Score'] = scores
return df_copy
# Run analysis
df_unerg_clip = compute_clip_similarity(df_unerg, model_clip, preprocess, device)
df_unacc_clip = compute_clip_similarity(df_unacc, model_clip, preprocess, device)
df_unerg_vlm = compute_qwen_scores(df_unerg, model_vlm, tokenizer_vlm, streamer=streamer)
df_unacc_vlm = compute_qwen_scores(df_unacc, model_vlm, tokenizer_vlm, streamer=streamer)
# Compare
print("CLIP Results:")
print(f" Unergative mean: {df_unerg_clip['CLIP_Similarity'].mean():.2f}")
print(f" Unaccusative mean: {df_unacc_clip['CLIP_Similarity'].mean():.2f}")
print("\nQwen-VL-Chat Results:")
print(f" Unergative mean: {df_unerg_vlm['VLM_Score'].mean():.2f}")
print(f" Unaccusative mean: {df_unacc_vlm['VLM_Score'].mean():.2f}")

Final Thoughts
This analysis didn’t change my theoretical interpretation of the experimental findings—but it made me much more confident in them. And that’s exactly what good methodological work should do.
If you’re running experiments with visual stimuli, I highly recommend giving this kind of analysis a try. Both CLIP and multimodal LLMs like Qwen2-VL are freely available, relatively easy to use, and can give you valuable insights into whether your materials are doing what you think they’re doing. The fact that you can now validate your stimuli using multiple computational approaches—from simple similarity scoring to sophisticated multimodal reasoning—provides unprecedented confidence in your experimental materials.
Plus, it’s just fun to see what these models “think” about your carefully crafted experimental stimuli. Sometimes they agree with each other and with you. Sometimes they surprise you. Either way, you learn something.
References
Momma, S., & Ferreira, V. (2019). Beyond linear order: The role of argument structure in speaking. Cognitive Psychology, 114, 101228.
Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., … & Sutskever, I. (2021). Learning transferable visual models from natural language supervision. International Conference on Machine Learning (pp. 8748-8763). PMLR.
Bai, J., Bai, S., Yang, S., Wang, S., Tan, S., Wang, P., … & Zhou, J. (2023). Qwen-VL: A frontier large vision-language model with versatile abilities. arXiv preprint arXiv:2308.12966.
Session Info
For reproducibility, here’s my setup:
import sys
print(f"Python: {sys.version}")
print(f"PyTorch: {torch.__version__}")
print(f"CLIP: (installed from https://github.com/openai/CLIP)")
print(f"Transformers: (for Qwen-VL-Chat)")

Footnotes
However, an interesting sidenote is that we do not really know if human cognition is also propositional.↩︎
They work very slowly because they are extremely resource hungry. The reason this post took so long to appear is that I was waiting for the results to come in.↩︎
There are of course other ways to test this. For example, Griffin & Bock (2000) used a free-production task in which participants were not given an initial word to use with the pictures. They quantified how many different words participants used for each picture, called that variable the ‘codability’ of the picture, and tested whether codability was related to onset latency. Egurtzegi et al. (2022) used a similar approach.↩︎